Abstract

Annealed importance sampling (AIS) is a common algorithm to estimate
partition functions of useful stochastic models. One important problem
for obtaining accurate AIS estimates is the selection of an annealing
schedule. Conventionally, an annealing schedule is often determined
heuristically or is simply set as a linearly increasing sequence. In this
paper, we propose an algorithm for the optimal schedule by deriving a
functional that dominates the AIS estimation error and by numerically
minimizing this functional. We experimentally demonstrate that the
proposed algorithm mostly outperforms conventional scheduling schemes
with large quantization numbers.


1. Introduction

A large number of useful stochastic models are defined using unnormalized
probabilities. Exact computation of the normalizing constants, or
partition functions, of such models is usually intractable. This makes it
difficult to compare different models or training algorithms with respect
to the probability that the models assign to validation data. Motivated
by this problem, extensive research has been conducted on the estimation
of partition functions (Gelman & Meng, 1998; Neal, 2001; Yedidia et al.,
2005). Annealed importance sampling (AIS) is a common estimation
algorithm for partition functions with the nice property that it yields
unbiased estimates (Neal, 2001; Salakhutdinov & Murray, 2008; Grosse et
al., 2013).

One of the principal problems in achieving accurate estimates is the
selection of an annealing path and an annealing schedule (Gelman & Meng,
1998; Grosse et al., 2013; Neal, 2001). Although most research has
addressed the selection of an annealing path (Gelman & Meng, 1998; Grosse
et al., 2013), little has been done on the selection of an annealing
schedule. Some researchers develop heuristic schedules (Salakhutdinov &
Murray, 2008; Desjardins et al., 2013), and others simply use the linear
schedule (Salakhutdinov & Hinton, 2009; Dauphin & Bengio, 2013). Grosse
et al. (2013) recently developed a scheduling algorithm, but this
algorithm failed to make remarkable improvements over the linear
schedule.

In this paper, we propose an alternative scheduling algorithm by
formulating the problem as variational minimization of a functional that
dominates the variance of estimates. We develop a numerical solver for
the variational problem and implement an optimization scheme for an
annealing schedule. We perform experiments on restricted Boltzmann
machines (RBMs) and show that the proposed algorithm outperforms
conventional scheduling schemes when the number of quantization levels is
large.


2. Models of Interest 

The schemes discussed in this paper cover stochastic models that assign
probabilities

$$p(\mathbf{v}; \theta) = \frac{p^*(\mathbf{v}; \theta)}{Z(\theta)} \tag{1}$$

to states $\mathbf{v} \in \mathcal{V}$, where $\theta$ are model
parameters and $p^*$ is the unnormalized probability, which can be
efficiently evaluated. The main interest of this paper is to estimate the
partition function of such models,

$$Z(\theta) \triangleq \sum_{\mathbf{v} \in \mathcal{V}} p^*(\mathbf{v}; \theta), \tag{2}$$

which is often intractable.

One example of such models is the RBM. An RBM is a binary Markov random
field with a bipartite graph structure that consists of two layers of
variables: visible variables $\mathbf{v}$ representing data and hidden
variables $\mathbf{h}$ representing latent features, both of which are
binary vectors (Hinton, 2002). The unnormalized probability of an RBM is
computed as

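$$p^*(\mathbf{v}; \theta) = \sum_{\mathbf{h}} \exp(-E(\mathbf{h}, \mathbf{v}; \theta)), \tag{3}$$

where the RBM energy function for $M$ hidden units and $D$ visible units
is defined as

$$E(\mathbf{h}, \mathbf{v}; \theta) = -\sum_{i=1}^{M} \sum_{j=1}^{D} h_i W_{ij} v_j - \sum_{i=1}^{M} a_i h_i - \sum_{j=1}^{D} b_j v_j, \tag{4}$$

and the parameters are $\theta = \{W, \mathbf{a}, \mathbf{b}\}$.

Algorithm 1 AIS: Annealed Importance Sampling

  Input: annealing schedule $\{\beta_k\}$, number of runs $N$,
         function $f : \mathcal{V} \to \mathbb{R}$
  Initialize $w^{(i)} = 1$ and sample $\mathbf{v}_0^{(i)}$ from
         $p_{\beta_0}(\mathbf{v}_0^{(i)})$ for $i = 1, \dots, N$
  for $k = 1$ to $K$ do
    for $i = 1$ to $N$ do
      Update $w^{(i)} \leftarrow w^{(i)} \, p^*_{\beta_k}(\mathbf{v}_{k-1}^{(i)}) / p^*_{\beta_{k-1}}(\mathbf{v}_{k-1}^{(i)})$
      Sample $\mathbf{v}_k^{(i)} \sim T_{\beta_k}(\mathbf{v}_k^{(i)} \mid \mathbf{v}_{k-1}^{(i)})$
    end for
    $\bar{f}_k = \sum_i w^{(i)} f(\mathbf{v}_k^{(i)}) / \sum_i w^{(i)}$
  end for
  Compute $\hat{Z}_B = Z(\theta^A) \sum_i w^{(i)} / N$
  Output: $\hat{Z}_B$, $\{\bar{f}_k\}$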
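As a rough illustration of Algorithm 1, the following Python sketch runs AIS on a small finite state space. It is not the authors' implementation: the function name `ais_log_z`, the toy base and target distributions, the geometric path, and the use of exact sampling from every intermediate distribution (the idealized "perfect transition" case) are all assumptions made to keep the example short. A practical implementation would replace `sample` with an MCMC operator that leaves $p_{\beta_k}$ invariant.

```python
import math
import random


def ais_log_z(log_p_a, log_p_b, betas, n_runs, rng):
    """Return an AIS estimate of log(Z_B / Z_A) on a finite state space.

    log_p_a, log_p_b: unnormalized log-probabilities of each state.
    betas: schedule with betas[0] == 0 and betas[-1] == 1.
    """
    states = list(range(len(log_p_a)))

    def log_p(beta, v):
        # Geometric path (assumed): p*_beta = p*_A^(1-beta) * p*_B^beta.
        return (1.0 - beta) * log_p_a[v] + beta * log_p_b[v]

    def sample(beta):
        # Exact draw from p_beta; stands in for an MCMC transition.
        weights = [math.exp(log_p(beta, v)) for v in states]
        return rng.choices(states, weights=weights)[0]

    log_w = []
    for _ in range(n_runs):
        v = sample(betas[0])
        lw = 0.0
        for k in range(1, len(betas)):
            # Importance weight update, then transition at beta_k.
            lw += log_p(betas[k], v) - log_p(betas[k - 1], v)
            v = sample(betas[k])
        log_w.append(lw)
    # Log of the mean weight, computed stably with log-sum-exp.
    m = max(log_w)
    return m + math.log(sum(math.exp(x - m) for x in log_w) / n_runs)


rng = random.Random(0)
log_p_a = [0.0] * 4                      # uniform base, Z_A = 4
log_p_b = [0.0, 1.0, 2.0, 3.0]           # toy target
betas = [k / 50 for k in range(51)]      # linear schedule, K = 50
estimate = math.log(4.0) + ais_log_z(log_p_a, log_p_b, betas, 500, rng)
exact = math.log(sum(math.exp(x) for x in log_p_b))
print(estimate, exact)
```

Working with log-weights rather than raw weights avoids overflow when $K$ is large, which is why the final average uses the log-sum-exp trick.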


3. Annealed Importance Sampling

Suppose that we are estimating the partition function
$Z_B = Z(\theta^B)$ of an intractable stochastic model
$p_B(\mathbf{v}) = p(\mathbf{v}; \theta^B)$ with parameters $\theta^B$.
A possible way to estimate $Z_B$ is to use importance sampling (IS) with
some tractable distribution $p_A$. Assuming that
$p_A(\mathbf{v}) \neq 0$ whenever $p_B(\mathbf{v}) \neq 0$, the partition
function can be approximated as

$$Z_B = \int \frac{p_B^*(\mathbf{v})}{p_A(\mathbf{v})} p_A(\mathbf{v}) \, d\mathbf{v} \approx \hat{Z}_B \triangleq \frac{1}{N} \sum_{i=1}^{N} \frac{p_B^*(\mathbf{v}^{(i)})}{p_A(\mathbf{v}^{(i)})}.$$

This Monte Carlo estimate is unbiased if we obtain i.i.d. samples
$\mathbf{v}^{(i)}$ from $p_A$. However, the variance of the estimate is
generally large unless $p_A$ is a close approximation of $p_B$, which is
not often the case.

Annealed importance sampling (AIS) eases this problem by using a sequence
of intermediate distributions $\{p_{\beta_k}\}$ defined with a sequence
$0 = \beta_0 < \dots < \beta_K = 1$ that interpolates between $p_A$ and
$p_B$, i.e., $p_{\beta_0 = 0} = p_A$ and $p_{\beta_K = 1} = p_B$ (Neal,
2001; Salakhutdinov & Murray, 2008; Sohl-Dickstein & Culpepper, 2012).
AIS alternates between importance weight updates and annealed MCMC
updates as in Algorithm 1, where $T_\beta$ is an MCMC transition operator
that leaves $p_\beta$ invariant. Note that AIS shares the unbiasedness
property with IS. Remarkably, unbiasedness holds even if the MCMC
transitions do not return independent samples (Neal, 2001).

As Neal (2001) suggests, the effective sample size (ESS) can be an
informative measure of the estimation accuracy of AIS. The ESS can be
estimated as

$$\mathrm{ESS} = \frac{N}{1 + s^2(w_*)}, \tag{5}$$

where $s^2(w_*)$ is the sample variance of the normalized weights
$w_*^{(i)} = w^{(i)} / \left(\frac{1}{N} \sum_j w^{(j)}\right)$. The ESS
is approximately inversely proportional to the variance of AIS estimates
and is reliable unless AIS samples are misallocated to major modes (Neal,
2001).

One interesting property of AIS is that the statistics of intermediate
distributions can be estimated with on-the-fly importance weights at any
point of annealing (Neal, 2001). For example, an expectation of some
function $f(\mathbf{v})$ with respect to $p_{\beta_k}$ can be estimated
as $\bar{f}_k$ as in Algorithm 1. We employ this property to approximate
the optimal annealing schedule in Section 5.

To achieve accurate estimates with AIS, we have two problems to solve:
the selection of the Markov transition operators $\{T_{\beta_k}\}$ and
the selection of the intermediate distributions $\{p_{\beta_k}\}$. As for
transition operators, Sohl-Dickstein & Culpepper (2012) recently proposed
implementing $\{T_{\beta_k}\}$ with Hybrid Monte Carlo for enhanced
mixing of Markov chains.


As for intermediate distributions, there has been a long history of
research (Ogata, 1989; Gelman & Meng, 1998; Grosse et al., 2013). The
design of intermediate distributions can be divided into two problems:
the selection of an annealing path and the selection of an annealing
schedule. An “annealing path” is a continuous parameterization of
distributions $p_\beta$ with $\beta \in [0, 1]$. Although AIS has its
origin in statistical physics (Iba, 2001), $\beta$ need not be the
inverse temperature, and any parameterization is possible. The most
commonly used annealing path is the geometric path (Neal, 2001; 1996;
Salakhutdinov & Murray, 2008; Tieleman, 2008; Dauphin & Bengio, 2013),
although it has been proved to be suboptimal in terms of estimation
accuracy (Gelman & Meng, 1998). Gelman & Meng (1998) analyzed the
relation between estimation errors and annealing paths, and derived the
optimal annealing path that minimizes the errors. However, the optimal
path suggested by Gelman & Meng (1998) is often intractable in practical
applications. Grosse et al. (2013) recently proposed the moment averaging
path, which is still suboptimal but can be used in practical problems and
results in better estimation accuracy than the geometric path.

Compared to annealing paths, little has been done on annealing schedules
for AIS. An “annealing schedule” denotes a binning or quantization of
$\beta \in [0, 1]$. For the tempered transition method, which is closely
related to AIS, scheduling techniques for the geometric path have been
studied in terms of the acceptance rate (Behrens et al., 2012; Neal,
1996). However, these techniques are customized for tempered transitions
and are not suitable for AIS. Grosse et al. (2013) recently proposed a
scheduling technique by formulating the problem as minimization of
$\log Z_B - \mathbb{E}[\log w]$. In this paper, we develop an alternative
technique by variational minimization of a functional that dominates the
variance of estimates, i.e., $\mathrm{Var}[\log w]$.

Figure 1. Comparison of annealing schedules for an RBM trained on MNIST
by using PCD.

Algorithm 2 VAROPT-AIS

  Input: $\tilde{K}$, $\tilde{N}$, $K$, $N$
  Let $\{\tilde{\beta}_k\}$ be a sequence of $\tilde{K}$ uniformly spaced
      points on $[0, 1]$.
  Estimate $\{g(\tilde{\beta}_k)\}$ using AIS($\{\tilde{\beta}_k\}$, $\tilde{N}$).
  Compute $\{\beta_k\}$ = DESolve($\{g(\tilde{\beta}_k)\}$).
  Estimate $Z_B$ using AIS($\{\beta_k\}$, $N$).
  Output: $\hat{Z}_B$

Algorithm 3 Deceleration of schedules

  Input: schedule $\{\beta_k\}$, maximum delta $\Delta\beta_{\max}$,
         tolerance Tol
  Initialize $\Delta\beta_k = \beta_k - \beta_{k-1}$ for $k = 1, \dots, K$.
  repeat
    Initialize noChange = true.
    for $k = 1$ to $K$ do
      if $\Delta\beta_k > \Delta\beta_{\max}$ then
        $\Delta\beta_k \leftarrow \Delta\beta_{\max}$
      end if
    end for
    Compute Norm $= \sum_k \Delta\beta_k$
    if $|\mathrm{Norm} - 1| >$ Tol then
      noChange = false
    end if
    for $k = 1$ to $K$ do
      $\Delta\beta_k \leftarrow \Delta\beta_k / \mathrm{Norm}$
    end for
  until noChange is true
  $\beta_k = \sum_{j \le k} \Delta\beta_j$
  Output: $\{\beta_k\}$

4. Estimation Errors and An Annealing Schedule

For analyzing AIS, it is useful to assume a perfect transition condition,
where $T_\beta$ returns samples that are independent of the previous ones
(Neal, 2001; Grosse et al., 2013). This condition is an ideal situation
in which the mixing of Markov chains is very fast. Under this condition,
$\log w$ can be regarded as a sum of $K$ independent random variables
$\log p^*_{\beta_{k+1}}(\mathbf{v}) - \log p^*_{\beta_k}(\mathbf{v})$,
where $\mathbf{v} \sim p_{\beta_k}$. Therefore, as Neal (2001) suggests,
$\log w$ approximately follows a normal distribution for large $K$ as a
consequence of the central limit theorem. The variance of $\log w$ can be
computed as

$$\mathrm{Var}[\log w] = \sum_{k=0}^{K-1} \mathrm{Var}_{\beta_k}\!\left[\log p^*_{\beta_{k+1}}(\mathbf{v}) - \log p^*_{\beta_k}(\mathbf{v})\right], \tag{6}$$

where $\mathrm{Var}_\beta$ denotes the variance w.r.t. $p_\beta$.

Because the variance becomes inversely proportional to $K$ as $K$
increases, we analyze the behavior of $K \, \mathrm{Var}[\log w]$ for
large $K$. By approximating the difference in the r.h.s. of Eq. (6) with
a Taylor series up to first order, we have the following approximation:

$$K \, \mathrm{Var}[\log w] \approx K \sum_{k=0}^{K-1} (\Delta\beta_k)^2 \, \mathrm{Var}_{\beta_k}\!\left[\frac{\partial}{\partial\beta} \log p^*_\beta(\mathbf{v}) \Big|_{\beta=\beta_k}\right], \tag{7}$$

where $\Delta\beta_k = \beta_{k+1} - \beta_k$.

Assume that the annealing schedule $\{\beta_k\}$ has a continuous limit,
i.e., $\beta_k = \beta(k/K)$ with some smooth function $\beta(t)$ defined
on $t \in [0, 1]$. Because the error caused by the approximation vanishes
under this assumption as $K \to \infty$, the scaled variance
asymptotically approaches a functional $\mathcal{J}(\beta(\cdot))$.

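Theorem 1. Assume perfect transitions. Assume that $\{\beta_k\}$ are
composed as $\beta_k = \beta(t_k)$, where $\beta(t)$ is a smooth function
defined on $t \in [0, 1]$ and $t_k = k/K$. Then as $K \to \infty$ the AIS
estimation error behaves as

$$K \, \mathrm{Var}[\log w] \to \mathcal{J}(\beta(\cdot)) = \int_0^1 \dot{\beta}^2 \, g(\beta) \, dt, \tag{8}$$

where $\dot{\beta}$ denotes the derivative of $\beta(t)$, i.e.,
$d\beta/dt$, and $g(\beta)$ is a function defined as
$g(\beta) = \mathrm{Var}_\beta\!\left[\frac{\partial}{\partial\beta} \log p^*_\beta(\mathbf{v})\right]$.
[See supplementary material for proof.]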
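The convergence stated in Theorem 1 can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: the function `scaled_variance` is a hypothetical helper that evaluates the first-order expression of Eq. (7) with $\Delta\beta_k = \beta_{k+1} - \beta_k$, and both $g(\beta) = 1 + \beta$ and $\beta(t) = t^2$ are arbitrary choices that do not come from any model in the paper. For these choices the limit functional has the closed form $\int_0^1 (2t)^2 (1 + t^2) \, dt = 32/15$.

```python
def scaled_variance(beta_of_t, g, n_steps):
    """Evaluate K * sum_k (beta_{k+1} - beta_k)^2 * g(beta_k), cf. Eq. (7)."""
    betas = [beta_of_t(k / n_steps) for k in range(n_steps + 1)]
    total = sum((betas[k + 1] - betas[k]) ** 2 * g(betas[k])
                for k in range(n_steps))
    return n_steps * total


def beta_of_t(t):
    # Toy continuous schedule (assumption): beta(t) = t^2.
    return t * t


def g(beta):
    # Toy variance profile (assumption): g(beta) = 1 + beta.
    return 1.0 + beta


limit = 32.0 / 15.0  # closed form of the functional for the toy choices
errors = [abs(scaled_variance(beta_of_t, g, K) - limit)
          for K in (10, 100, 1000)]
print(errors)
```

The gap between the discrete sum and the functional shrinks roughly in proportion to $1/K$ in this example.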


Figure 2. $\log \hat{Z}_B$ for tractable RBMs as a function of $K$. Error
bars show $\pm 3\sigma$ intervals of $\log w$. The black broken lines
indicate the ground truth.

Figure 3. Estimates of the ESS’s plotted as a function of $K$.

Here the problem of finding the optimal schedule that minimizes the
estimation error is formulated as a variational minimization problem of
the functional $\mathcal{J}(\beta(\cdot))$ w.r.t. $\beta(\cdot)$. From
the Euler-Lagrange equation (Bishop, 2006), we derive the following
differential equation that the optimal schedule obeys:

$$\ddot{\beta} + \frac{\dot{\beta}^2}{2} \frac{d}{d\beta} \log g(\beta) = 0. \tag{9}$$
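One way to get a feel for solutions of Eq. (9) is to note that $\dot{\beta}^2 g(\beta)$ is constant along any solution (differentiate it with respect to $t$ and substitute Eq. (9)), so that $\dot{\beta} \propto g(\beta)^{-1/2}$. The sketch below uses this first integral and simple quadrature instead of the fixed point iteration used in the paper, so it is a swapped-in technique for illustration only. The function name `optimal_schedule`, the grid size, and the toy profile $g(\beta) = (\beta + 0.01)^{-2}$ are assumptions; $g$ must be strictly positive.

```python
import math


def optimal_schedule(g, n_steps, grid=10_000):
    """Schedule beta_0..beta_K with d(beta)/dt proportional to g(beta)**-0.5."""
    xs = [i / grid for i in range(grid + 1)]
    root = [math.sqrt(g(x)) for x in xs]
    # Cumulative integral of sqrt(g) on [0, 1] by the trapezoidal rule.
    cum = [0.0]
    for i in range(grid):
        cum.append(cum[-1] + 0.5 * (root[i] + root[i + 1]) / grid)
    betas, j = [], 0
    for k in range(n_steps + 1):
        # t_k = k / K corresponds to this fraction of the total integral.
        target = cum[-1] * k / n_steps
        while j < grid - 1 and cum[j + 1] < target:
            j += 1
        # Linear interpolation inside the grid cell [xs[j], xs[j + 1]].
        frac = (target - cum[j]) / (cum[j + 1] - cum[j])
        betas.append(xs[j] + frac / grid)
    return betas


def g(beta):
    # Toy profile (assumption): large variance near beta = 0.
    return 1.0 / (beta + 0.01) ** 2


betas = optimal_schedule(g, 10)
# For this g the schedule has the closed form beta(t) = (101**t - 1) / 100.
closed_form = [(101.0 ** (k / 10) - 1.0) / 100.0 for k in range(11)]
print(betas)
```

In this toy case the schedule takes small steps near $\beta = 0$, where $g$ is large, and larger steps towards $\beta = 1$.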

5. Numerical Search for The Optimal Schedule

By numerically solving Eq. (9) with boundary conditions $\beta(0) = 0$
and $\beta(1) = 1$, we can find the optimal schedule. A problem here is
that $g(\beta)$ is intractable. To overcome this difficulty, we propose
the VAROPT-AIS algorithm listed in Algorithm 2, where we perform AIS
twice: $g(\beta)$ is estimated by the first (cheap) execution of AIS with
linear scheduling, and $\log Z_B$ is estimated by the second (expensive)
execution of AIS with a schedule computed from the first execution. The
key idea of VAROPT-AIS is to execute cheap AIS first to roughly survey
the terrain along the annealing path, i.e., $g(\beta)$, and then to
execute expensive AIS to obtain a thorough estimate.

In VAROPT-AIS, we use the method of fixed point iteration (Kelley, 1995)
to solve the differential equation of Eq. (9) (labeled as DESolve in
Algorithm 2). Because the l.h.s. of Eq. (9) does not directly depend on
$g(\beta)$, we perform numerical differentiation to approximate
$\frac{d}{d\beta} \log g(\beta)$ as preprocessing. We also perform
convolutional smoothing of the $g(\beta)$ estimates to remove less
important noise.
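The preprocessing described above can be sketched as follows. The source does not specify the smoothing kernel or the differencing scheme, so the box (moving-average) kernel in `smooth` and the central differences in `dlog_g` are assumptions chosen for simplicity, and both function names are hypothetical.

```python
import math


def smooth(values, half_width):
    """Moving-average convolution; the window is truncated at the edges."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def dlog_g(betas, g_values):
    """Approximate d/d(beta) log g by central differences on the grid.

    One-sided differences are used at the two end points.
    """
    log_g = [math.log(g) for g in g_values]
    n = len(betas)
    slopes = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n - 1, i + 1)
        slopes.append((log_g[hi] - log_g[lo]) / (betas[hi] - betas[lo]))
    return slopes


betas = [i / 10 for i in range(11)]
g_values = [math.exp(3.0 * b) for b in betas]  # toy g with slope of log g = 3
slopes = dlog_g(betas, g_values)
smoothed = smooth([0.0, 0.0, 3.0, 0.0, 0.0], 1)
print(slopes, smoothed)
```

Smoothing before differentiation matters in practice because finite differences amplify the Monte Carlo noise in the $g(\beta)$ estimates.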

Because Eq. (9) is derived based on the assumption of perfect
transitions, the solution of Eq. (9) can have large
$\Delta\beta_k = \beta_k - \beta_{k-1}$ that can impede the mixing of
Markov chains. This can damage the estimation accuracy of AIS. To ease
this effect, we optionally decelerate an annealing schedule s.t.
$\max_k \Delta\beta_k \le \Delta\beta_{\max}$ with Algorithm 3. This
heuristic algorithm sequentially clips $\Delta\beta_k$ at
$\Delta\beta_{\max}$ and stretches all $\Delta\beta_k$ to compensate for
the error caused by clipping.
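The clip-and-stretch heuristic of Algorithm 3 can be sketched in Python as below. This is one reading of the algorithm rather than the authors' code: the loop stops as soon as the clipped steps sum to 1 within the tolerance, and the feasibility check, the `max_iter` guard, and the function name `decelerate` are additions made for the example.

```python
def decelerate(betas, max_delta, tol=1e-9, max_iter=10_000):
    """Limit every step of a schedule on [0, 1] to at most max_delta."""
    deltas = [b1 - b0 for b0, b1 in zip(betas, betas[1:])]
    if max_delta * len(deltas) < 1.0:
        raise ValueError("max_delta is too small for the steps to sum to 1")
    for _ in range(max_iter):
        deltas = [min(d, max_delta) for d in deltas]   # clip large steps
        norm = sum(deltas)
        if abs(norm - 1.0) < tol:
            break
        deltas = [d / norm for d in deltas]            # stretch all steps
    else:
        raise RuntimeError("deceleration did not converge")
    # Rebuild the schedule as the cumulative sum of the steps.
    out = [0.0]
    for d in deltas:
        out.append(out[-1] + d)
    return out


fast = [0.0, 0.7, 0.8, 0.9, 1.0]    # toy schedule with one large step
slow = decelerate(fast, max_delta=0.4)
print(slow)
```

In the toy example the first step of 0.7 is reduced to 0.4 and the remaining three steps grow from 0.1 to about 0.2 each, so the schedule still ends at 1.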

6. Remarks 

The methodology developed in this paper can be applied to the various
kinds of stochastic models to which AIS is applicable. Nevertheless, we
are mainly interested in RBMs and only perform experiments on RBMs in
this paper.

Also note that our method can be combined with various established
techniques for AIS. First, the proposed method can be combined with HAIS
(Sohl-Dickstein & Culpepper, 2012) because the selection of an annealing
schedule is independent of the implementation of Markov transitions.
Second, the proposed method can be used to schedule various types of
annealing paths, possibly including the moment averaging path with
geometric interpolation (Grosse et al., 2013). Finally, the proposed
method can easily be combined with a technique for tuning a proposal
distribution (Kiwaki & Aihara, 2014).

Figure 1 compares an annealing schedule by our method (labeled as VAROPT)
with those by several others: (LINEAR) the linear schedule that
corresponds to $\beta(t) = t$; (GMS13) a scheme suggested by Grosse et
al. (2013); and (SM08) a heuristic schedule suggested by Salakhutdinov &
Murray (2008). Several interesting points can be seen from these plots.
First, GMS13 is largely different from VAROPT and is rather similar to
the linear schedule. This clearly shows that our objective
$\mathcal{J}(\beta(\cdot))$ is intrinsically different from the objective
proposed by Grosse et al. (2013). Second, VAROPT is similar to the
heuristic schedule by Salakhutdinov & Murray (2008). This remarkable
coincidence suggests that our proposal can possibly automate the
expensive manual search for heuristic annealing schedules.

Variational Optimization of Annealing Schedules

Table 1. Estimates of the partition functions and the ESS’s for tractable
RBMs. The ground truth of $\log Z_B$ is also reported. All the figures
are obtained with K = 100,000. Columns: true = ground-truth $\log Z_B$;
est. = estimated $\log \hat{Z}_B$.

                PCD(20)                      CD1(20)                      CD25(20)
schedule        true      est.      ESS      true      est.      ESS      true      est.      ESS
VAROPT          208.63    208.629            193.951   194.117   87       207.12    207.136   776
VAROPT0.009               208.616   809
VAROPT0.006
VAROPT0.003                                                                                   797
LINEAR                    208.626                                                   207.099   664
GMS13                                                                               207.175

Figure 4. Evolution of the ESS’s for various K. These plots are computed
with PCD(20).

7. Experiments 

To demonstrate the benefits of the proposed method, we performed
partition function estimation for several RBMs with various annealing
schedules. We evaluated scheduling schemes with respect to two measures:
ESS estimates and $\log \hat{Z}_B$. As Neal (2001) warns, ESS estimates
can be misleading if AIS fails to find important major modes of $p_B$,
and therefore one should be careful when reporting the estimates. In our
experiments, however, we regard the ESS estimates as reliable because the
partition function estimates seem reliable in most cases, judging from
comparison with the ground truth or with estimates by different schemes.
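For reference, an ESS estimate of the kind used as an evaluation measure here can be computed from AIS log-weights as in the sketch below. It follows the form $N / (1 + s^2)$ with weights normalized by their mean (Neal, 2001). The function name `effective_sample_size` and the use of the $1/N$ (population) variance convention are assumptions; with that convention the estimate equals $(\sum_i w^{(i)})^2 / \sum_i (w^{(i)})^2$.

```python
import math


def effective_sample_size(log_w):
    """ESS = N / (1 + variance of mean-normalized importance weights)."""
    n = len(log_w)
    m = max(log_w)
    # Rescaling by exp(-m) avoids overflow and does not change the ESS.
    w = [math.exp(x - m) for x in log_w]
    mean_w = sum(w) / n
    normalized = [x / mean_w for x in w]
    # Population variance (divides by N); the normalized mean is 1.
    variance = sum((x - 1.0) ** 2 for x in normalized) / n
    return n / (1.0 + variance)


equal = effective_sample_size([0.0, 0.0, 0.0, 0.0])
dominated = effective_sample_size([0.0, -1e9, -1e9, -1e9])
print(equal, dominated)
```

Equal weights give an ESS equal to the number of runs, while a single dominant weight drives the ESS towards 1.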

RBMs were trained on MNIST by using three training algorithms: (PCD)
persistent contrastive divergence (Tieleman, 2008), (CD1) contrastive
divergence (CD) with 1 step of state update, and (CD25) CD with 25 steps
(Hinton, 2002). We label RBMs with the training algorithm and the number
of hidden units; for example, PCD(500) denotes an RBM with 500 hidden
units trained by PCD.

All the executions of AIS followed the geometric path. We fixed the
number of AIS runs as N = 1,000 and explored various magnitudes of K.

We mainly compared the following three scheduling techniques: GMS13,
LINEAR, and VAROPT. VAROPT schedules were computed with N = 100 and
K = 1,000 for the first (cheap) execution of AIS. Note that the
computation required to obtain VAROPT schedules was negligible compared
to the cost of the following main execution of AIS. In addition to simple
VAROPT, we also tested decelerated VAROPT schedules with
$\Delta\beta_{\max} \in \{0.003, 0.006, 0.009\}$. Decelerated schedules
are labeled as VAROPT followed by the value of $\Delta\beta_{\max}$
(e.g., VAROPT0.009).

GMS13 schedules were determined using 10 different points of RBM
parameters (knots) on the geometric path. On each knot, we estimated the
moments of the RBM using 1,000 Markov chains with 6,000 updates,
including 1,000 steps of burn-in updates.

Figure 5. $\log \hat{Z}_B$ for intractable RBMs as a function of $K$.
Error bars show $\pm 3\sigma$ intervals of $\log w$.

Figure 6. ESS estimates for intractable RBMs as a function of $K$.

7.1. Experiments with Small Tractable RBMs 

We first show results with RBMs that have only 20 hidden units. Note that
we can compute the exact value of $\log Z_B$ for these RBMs by summing
over all the possible states of the hidden units. For training RBMs, we
used a fixed learning rate and performed 250,000 parameter updates.
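The exact computation mentioned above can be sketched as follows: for each of the $2^M$ hidden configurations the visible units are summed out analytically, so the cost is exponential only in the number of hidden units. The energy convention $E(\mathbf{h}, \mathbf{v}) = -\mathbf{h}^\top W \mathbf{v} - \mathbf{a}^\top \mathbf{h} - \mathbf{b}^\top \mathbf{v}$, the function names, and the tiny parameter values are assumptions for illustration. With 20 hidden units this pure-Python loop would be slow, and a vectorized implementation would be preferable.

```python
import itertools
import math


def log_z_exact(W, a, b):
    """log Z of a binary RBM by enumerating the 2**M hidden states.

    Assumed energy: E(h, v) = -h.W.v - a.h - b.v, with W of shape M x D.
    """
    M, D = len(W), len(b)
    terms = []
    for h in itertools.product((0, 1), repeat=M):
        t = sum(a[i] * h[i] for i in range(M))
        for j in range(D):
            act = b[j] + sum(h[i] * W[i][j] for i in range(M))
            # log(1 + exp(act)): visible unit j summed out analytically.
            t += max(act, 0.0) + math.log1p(math.exp(-abs(act)))
        terms.append(t)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def log_z_brute_force(W, a, b):
    """Reference value: sum exp(-E) over every (h, v) pair."""
    M, D = len(W), len(b)
    total = 0.0
    for h in itertools.product((0, 1), repeat=M):
        for v in itertools.product((0, 1), repeat=D):
            energy = -sum(h[i] * W[i][j] * v[j]
                          for i in range(M) for j in range(D))
            energy -= sum(a[i] * h[i] for i in range(M))
            energy -= sum(b[j] * v[j] for j in range(D))
            total += math.exp(-energy)
    return math.log(total)


W = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7]]   # toy weights, M = 2, D = 3
a = [0.1, -0.2]                            # hidden biases (assumed naming)
b = [0.0, 0.4, -0.3]                       # visible biases (assumed naming)
fast = log_z_exact(W, a, b)
slow = log_z_brute_force(W, a, b)
zero = log_z_exact([[0.0] * 3, [0.0] * 3], [0.0, 0.0], [0.0, 0.0, 0.0])
print(fast, slow, zero)
```

With all parameters set to zero every joint state has unit weight, so $\log Z = (M + D) \log 2$, which gives a quick sanity check.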

The results are summarized in Table 1. Remarkably, VAROPT0.009 achieves
the highest ESS’s for all the RBMs. Estimates of $\log \hat{Z}_B$ and the
ESS’s are plotted as a function of $K$ in Figs. 2 and 3. Note that the
proposed method is solely represented by VAROPT0.009 in these plots. From
these plots, it can be observed that VAROPT0.009 achieves the smallest
estimation errors and the greatest ESS’s in most of the cases, especially
with large $K$.

To better understand the behavior of VAROPT, we computed ESS estimates
with on-the-fly AIS weights as shown in Fig. 4. Note that such on-the-fly
ESS’s are valid statistics because on-the-fly AIS weights can be used to
estimate the statistics of the intermediate distributions. Because the
estimation error is accumulated throughout annealing (as Eq. (6)
suggests), monitoring on-the-fly ESS estimates helps us to understand the
characteristics of annealing schedules. It can be seen from Fig. 4 that
VAROPT has a steep drop in the ESS’s at the very beginning of the
annealing. It is also shown that deceleration effectively relaxes this
problem to yield higher ESS’s. We understand that this sudden drop in the
ESS’s is due to poor mixing of Markov chains because the drop becomes
smaller with larger values of $K$. Therefore, the larger $K$ becomes, the
better estimation accuracy VAROPT enjoys.

7.2. Experiments with Intractable RBMs 

We next report estimation on intractable RBMs with 500 hidden units. RBMs
were trained using randomly sampled hyperparameters such as the number of
training epochs, learning rates, and L2 regularization.

Estimates of $\log \hat{Z}_B$ and the ESS’s are plotted in Figs. 5 and 6.
Table 2 shows the $\log \hat{Z}_B$ and ESS estimates for K = 100,000. The
scores by (decelerated) VAROPT here look less appealing than for
tractable RBMs. In particular, (decelerated) VAROPT exhibits large
estimation errors for small $K$. This is possibly due to poorer mixing of
RBMs with a larger number of hidden units. Nevertheless, estimation
errors with (decelerated) VAROPT are rapidly reduced as $K$ increases.
Thus, decelerated VAROPT schedules achieve greater ESS’s than the
conventional scheduling schemes for all the RBMs with K = 100,000, as in
Table 2.

Table 2. Estimates of the partition functions and the ESS’s for
intractable RBMs. All the figures are obtained with K = 100,000.

                PCD(500)           CD1(500)           CD25(500)
schedule        log Z_B    ESS     log Z_B    ESS     log Z_B    ESS
VAROPT          525.55     726     676.149    865     379.68     545
VAROPT0.009     525.564    719     676.158    868     379.687    851
VAROPT0.006     525.529    728     676.169    855     379.682    852
VAROPT0.003     525.548    661     676.13     873     379.68     778
LINEAR          525.545    329     676.167    872     379.628    712
GMS13           525.593    266     676.17     840     379.696    637

8. Conclusion 

We addressed the problem of determining the optimal annealing schedule
for AIS. Assuming perfect transitions, we derived a functional that
dominates the estimation error and formulated the problem as a
variational minimization problem. We developed a numerical scheme to
solve this variational problem and implemented a practical algorithm to
approximate the optimal annealing schedule. We performed experiments and
demonstrated that the proposed algorithm achieved better estimation
accuracy than conventional schemes in most cases with a large number of
intermediate distributions.
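A. Derivation of Eq. (7)

By Taylor series expansion of $\log p^*_\beta(\mathbf{v})$ w.r.t.
$\beta$, the variance $\mathrm{Var}[\log w]$ can be written as

$$\mathrm{Var}[\log w] = \sum_{k=0}^{K-1} \mathrm{Var}_{\beta_k}\!\left[\log p^*_{\beta_{k+1}}(\mathbf{v}) - \log p^*_{\beta_k}(\mathbf{v})\right] \tag{10}$$

$$= \sum_{k=0}^{K-1} \mathrm{Var}_{\beta_k}\!\left[\frac{\partial}{\partial\beta} \log p^*_\beta(\mathbf{v}) \Big|_{\beta=\beta_k} \Delta\beta_k + \delta(\mathbf{v}, \beta_k) \, O(\Delta\beta_k^2)\right], \tag{11}$$

where we defined $\Delta\beta_k = \beta_{k+1} - \beta_k$, and the
coefficients for the higher order terms are represented by
$\delta(\mathbf{v}, \beta_k)$. Because $\Delta\beta_k$ does not depend on
$\mathbf{v}$, $\mathrm{Var}[\log w]$ can be further rewritten as

$$\mathrm{Var}[\log w] = \sum_{k=0}^{K-1} (\Delta\beta_k)^2 \, \mathrm{Var}_{\beta_k}\!\left[\frac{\partial}{\partial\beta} \log p^*_\beta(\mathbf{v}) \Big|_{\beta=\beta_k}\right] + \sum_{k=0}^{K-1} \delta(\beta_k) \, O(\Delta\beta_k^3), \tag{12}$$

where
$\delta(\beta_k) = 2 \, \mathrm{Cov}_{\beta_k}\!\left(\delta(\mathbf{v}, \beta_k), \frac{\partial}{\partial\beta} \log p^*_\beta(\mathbf{v})\right)$,
with $\mathrm{Cov}_\beta$ being the covariance operator w.r.t. $p_\beta$.
Neglecting the second term of the r.h.s. yields Eq. (7).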


B. Proof of Theorem 1 

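Theorem 1. Assume perfect transitions. Assume that $\{\beta_k\}$ are
composed as $\beta_k = \beta(t_k)$, where $\beta(t)$ is a smooth function
defined on $t \in [0, 1]$ and $t_k = k/K$. Then as $K \to \infty$ the AIS
estimation error behaves as

$$K \, \mathrm{Var}[\log w] \to \mathcal{J}(\beta(\cdot)) = \int_0^1 \dot{\beta}^2 \, g(\beta) \, dt, \tag{13}$$

where $\dot{\beta}$ denotes the derivative of $\beta(t)$, i.e.,
$d\beta/dt$, and $g(\beta)$ is a function defined as
$g(\beta) = \mathrm{Var}_\beta\!\left[\frac{\partial}{\partial\beta} \log p^*_\beta(\mathbf{v})\right]$.

Proof. From Eq. (12), the scaled variance is written as

$$K \, \mathrm{Var}[\log w] = \sum_{k=0}^{K-1} (K \Delta\beta_k)^2 \, \mathrm{Var}_{\beta_k}\!\left[\frac{\partial}{\partial\beta} \log p^*_\beta(\mathbf{v}) \Big|_{\beta=\beta_k}\right] \frac{1}{K} + K \sum_{k=0}^{K-1} \delta(\beta_k) \, O(\Delta\beta_k^3).$$

The second term of the r.h.s. vanishes as $K \to \infty$ because

$$\left| K \sum_{k=0}^{K-1} \delta(\beta_k) \, O(\Delta\beta_k^3) \right| \le C K \sum_{k=0}^{K-1} |\delta(\beta_k)| \, |\Delta\beta_k|^3 \le C C'^3 K \sum_{k=0}^{K-1} |\delta(\beta_k)| \, |t_{k+1} - t_k|^3 = O(K^{-1}) \to 0$$

for some constants $C, C' > 0$. Note that we have
$|\beta_{k+1} - \beta_k| \le C' |t_{k+1} - t_k|$ because $\beta(t)$ is
smooth, and $|\delta(\beta)| < \infty$ because $p_\beta$ is smooth. The
scaled variance is dominated by the first term of the r.h.s., which has
the following limit as $K \to \infty$:

$$\mathcal{J}(\beta(\cdot)) = \int_0^1 \dot{\beta}^2 \, \mathrm{Var}_\beta\!\left[\frac{\partial}{\partial\beta} \log p^*_\beta(\mathbf{v})\right] dt. \tag{14}$$

Therefore, $K \, \mathrm{Var}[\log w] \to \mathcal{J}(\beta(\cdot))$. □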


C. Derivation of Eq. (9) 

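The Euler-Lagrange equation for $\mathcal{J}(\beta(\cdot))$ is

$$\frac{d}{dt} \frac{\partial G}{\partial \dot{\beta}} = \frac{\partial G}{\partial \beta}, \tag{15}$$

where $G = \dot{\beta}^2 g(\beta)$. The l.h.s. is computed as
$\frac{d}{dt}\left(2 \dot{\beta} g(\beta)\right) = 2\left(\ddot{\beta} g(\beta) + \frac{dg}{d\beta} \dot{\beta}^2\right)$.
The r.h.s. is computed as $\dot{\beta}^2 \frac{dg}{d\beta}$. By replacing
both sides of Eq. (15) with these results, we have

$$\ddot{\beta} + \frac{\dot{\beta}^2}{2 g(\beta)} \frac{dg}{d\beta} = \ddot{\beta} + \frac{\dot{\beta}^2}{2} \frac{d}{d\beta} \log g(\beta) = 0.$$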
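As a supplementary worked step that is not part of the original derivation, the resulting equation can be integrated once. Assuming $g(\beta) > 0$ and a monotonically increasing schedule ($\dot{\beta} > 0$), multiplying it by $2 \dot{\beta} g(\beta)$ shows that $\dot{\beta}^2 g(\beta)$ is conserved, which fixes the schedule up to a constant $c$ determined by the boundary conditions $\beta(0) = 0$ and $\beta(1) = 1$:

```latex
\begin{aligned}
0 &= 2\dot{\beta}\, g(\beta)\left(\ddot{\beta}
     + \frac{\dot{\beta}^{2}}{2}\frac{d}{d\beta}\log g(\beta)\right)
   = 2\dot{\beta}\ddot{\beta}\, g(\beta) + \dot{\beta}^{3}\frac{dg}{d\beta}
   = \frac{d}{dt}\left(\dot{\beta}^{2} g(\beta)\right), \\
&\Rightarrow\quad \dot{\beta} = \frac{c}{\sqrt{g(\beta)}}, \qquad
   c = \int_{0}^{1}\sqrt{g(\beta)}\, d\beta, \\
&\Rightarrow\quad \mathcal{J}(\beta(\cdot))
   = \int_{0}^{1}\dot{\beta}^{2} g(\beta)\, dt = c^{2}
   \le \int_{0}^{1} g(\beta)\, d\beta .
\end{aligned}
```

The last inequality is the Cauchy-Schwarz inequality, and its right-hand side is the value of the functional for the linear schedule $\beta(t) = t$, so under these assumptions the solution is never worse than the linear schedule in terms of $\mathcal{J}$.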




















